23 research outputs found
Weighted linear fusion of multimodal data - a reasonable baseline?
The ever-increasing demand for reliable inference capable of handling unpredictable challenges of practical application in the real world, has made research on information fusion of major importance. There are few fields of application and research where this is more evident than in the sphere of multimedia which by its very nature inherently involves the use of multiple modalities, be it for learning, prediction, or human-computer interaction, say. In the development of the most common type, score-level fusion algorithms,it is virtually without an exception desirable to have as a reference starting point a simple and universally sound baseline benchmark which newly developed approaches can be compared to. One of the most pervasively used methods is that of weighted linear fusion.It has cemented itself as the default off-the-shelf baseline owing to its simplicity of implementation, interpretability, and surprisingly competitive performance across a wide range of application domains and information source types. In this paper I argue that despite this track record, weighted linear fusion is not a good baseline on the grounds that there is an equally simple and interpretable alternative – namely quadratic mean-based fusion – which is theoretically more principled and which is more successful in practice. I argue the former from first principles and demonstrate the latter using a series of experiments on a diverse set of fusion problems: computer vision-based object recognition, arrhythmia detection, and fatality prediction in motor vehicle accidents.Postprin
GhostVLAD for set-based face recognition
The objective of this paper is to learn a compact representation of image
sets for template-based face recognition. We make the following contributions:
first, we propose a network architecture which aggregates and embeds the face
descriptors produced by deep convolutional neural networks into a compact
fixed-length representation. This compact representation requires minimal
memory storage and enables efficient similarity computation. Second, we propose
a novel GhostVLAD layer that includes {\em ghost clusters}, that do not
contribute to the aggregation. We show that a quality weighting on the input
faces emerges automatically such that informative images contribute more than
those with low quality, and that the ghost clusters enhance the network's
ability to deal with poor quality images. Third, we explore how input feature
dimension, number of clusters and different training techniques affect the
recognition performance. Given this analysis, we train a network that far
exceeds the state-of-the-art on the IJB-B face recognition dataset. This is
currently one of the most challenging public benchmarks, and we surpass the
state-of-the-art on both the identification and verification protocols.Comment: Accepted by ACCV 201
Ancient Roman coin retrieval : a systematic examination of the effects of coin grade
Ancient coins are historical artefacts of great significance which attract the interest of scholars, and a large and growing number of amateur collectors. Computer vision based analysis and retrieval of ancient coins holds much promise in this realm, and has been the subject of an increasing amount of research. The present work is in great part motivated by the lack of systematic evaluation of the existing methods in the context of coin grade which is one of the key challenges both to humans and automatic methods. We describe a series of methods – some being adopted from previous work and others as extensions thereof – and perform the first thorough analysis to date.Postprin
Face recognition from video
In spite of over two decades of intense research, illumination and pose invariance remain prohibitively challenging aspects of face recognition for most practical applications. The objective of this work is to recognize faces using video sequences both for training and recognition input, in a realistic, unconstrained setup in which lighting, pose and user motion pattern have a wide variability and face images are of low resolution. In particular there are three areas of novelty: (i) we show how a photometric model of image formation can be combined with a statistical model of generic face appearance variation, learnt offline, to generalize in the presence of extreme illumination changes; (ii) we use the smoothness of geodesically local appearance manifold structure and a robust same-identity likelihood to achieve invariance to unseen head poses; and (iii) we introduce an accurate video sequence "reillumination" algorithm to achieve robustness to face motion patterns in video. We describe a fully automatic recognition system based on the proposed method and an extensive evaluation on 171 individuals and over 1300 video sequences with extreme illumination, pose and head motion variation. On this challenging data set our system consistently demonstrated a nearly perfect recognition rate (over 99.7%), significantly outperforming state-of-the-art commercial software and methods from the literature. © Springer-Verlag Berlin Heidelberg 2006
Achieving robust face recognition from video by combining a weak photometric model and a learnt generic face invariant
In spite of over two decades of intense research, illumination and pose invariance remain prohibitively challenging aspects of face recognition for most practical applications. The objective of this work is to recognize faces using video sequences both for training and recognition input, in a realistic, unconstrained setup in which lighting, pose and user motion pattern have a wide variability and face images are of low resolution. The central contribution is an illumination invariant, which we show to be suitable for recognition from video of loosely constrained head motion. In particular there are three contributions: (i) we show how a photometric model of image formation can be combined with a statistical model of generic face appearance variation to exploit the proposed invariant and generalize in the presence of extreme illumination changes; (ii) we introduce a video sequence re-illumination algorithm to achieve fine alignment of two video sequences; and (iii) we use the smoothness of geodesically local appearance manifold structure and a robust same-identity likelihood to achieve robustness to unseen head poses. We describe a fully automatic recognition system based on the proposed method and an extensive evaluation on 323 individuals and 1474 video sequences with extreme illumination, pose and head motion variation. Our system consistently achieved a nearly perfect recognition rate (over 99.7% on all four databases). © 2012 Elsevier Ltd All rights reserved
Discriminative extended canonical correlation analysis for pattern set matching
In this paper we address the problem of matching sets of vectors embedded in
the same input space. We propose an approach which is motivated by canonical
correlation analysis (CCA), a statistical technique which has proven successful
in a wide variety of pattern recognition problems. Like CCA when applied to the
matching of sets, our extended canonical correlation analysis (E-CCA) aims to
extract the most similar modes of variability within two sets. Our first major
contribution is the formulation of a principled framework for robust inference
of such modes from data in the presence of uncertainty associated with noise
and sampling randomness. E-CCA retains the efficiency and closed form
computability of CCA, but unlike it, does not possess free parameters which
cannot be inferred directly from data (inherent data dimensionality, and the
number of canonical correlations used for set similarity computation). Our
second major contribution is to show that in contrast to CCA, E-CCA is readily
adapted to match sets in a discriminative learning scheme which we call
discriminative extended canonical correlation analysis (DE-CCA). Theoretical
contributions of this paper are followed by an empirical evaluation of its
premises on the task of face recognition from sets of rasterized appearance
images. The results demonstrate that our approach, E-CCA, already outperforms
both CCA and its quasi-discriminative counterpart constrained CCA (C-CCA), for
all values of their free parameters. An even greater improvement is achieved
with the discriminative variant, DE-CCA.Comment: Machine Learning, 201
A unified framework for thermal face recognition
The reduction of the cost of infrared (IR) cameras in recent years has made IR imaging a highly viable modality for face recognition in practice. A particularly attractive advantage of IR-based over conventional, visible spectrumbased face recognition stems from its invariance to visible illumination. In this paper we argue that the main limitation of previous work on face recognition using IR lies in its ad hoc approach to treating different nuisance factors which affect appearance, prohibiting a unified approach that is capable of handling concurrent changes in multiple (or indeed all) major extrinsic sources of variability, which is needed in practice. We describe the first approach that attempts to achieve this – the framework we propose achieves outstanding recognition performance in the presence of variable (i) pose, (ii) facial expression, (iii) physiological state, (iv) partial occlusion due to eye-wear, and (v) quasi-occlusion due to facial hair growth
GhostVLAD for set-based face recognition
The objective of this paper is to learn a compact representation of image sets for template-based face recognition. We make the following contributions: first, we propose a network architecture which aggregates and embeds the face descriptors produced by deep convolutional neural networks into a compact fixed-length representation. This compact representation requires minimal memory storage and enables efficient similarity computation. Second, we propose a novel GhostVLAD layer that includes ghost clusters, that do not contribute to the aggregation. We show that a quality weighting on the input faces emerges automatically such that informative images contribute more than those with low quality, and that the ghost clusters enhance the network’s ability to deal with poor quality images. Third, we explore how input feature dimension, number of clusters and different training techniques affect the recognition performance. Given this analysis, we train a network that far exceeds the state-of-the-art on the IJB-B face recognition dataset. This is currently one of the most challenging public benchmarks, and we surpass the state-of-the-art on both the identification and verification protocols